Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
by baskerville.CS.Arizona.EDU (8.9.1a/8.9.1) with SMTP id IAA24270
for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Fri, 11 Sep 1998 08:21:24 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
id AA31078; Fri, 11 Sep 1998 08:20:57 -0700
To: icon-group@optima.CS.Arizona.EDU
Date: Fri, 11 Sep 1998 09:22:34 +0900
From: Eric Hildum <Eric.Hildum@japan.ncr.com>
Message-Id: <35F86D49.9BDF7813@Japan.NCR.COM>
Organization: NCR Japan
Sender: icon-group-request@optima.CS.Arizona.EDU
References: <35F723CF.76B3CC97@Japan.NCR.COM>, <6t9b4o$8rs$1@ringer.cs.utsa.edu>
Subject: Re: Unicode support or support for non-Ascii based character manipulation?
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO
Clinton Jeffery wrote:
> Eric Hildum (Eric.Hildum@Japan.NCR.COM) wrote (and I paraphrase/edited):
> : Icon ... supporting only ASCII makes it less useful for non-English language
> : With Unicode... it should be possible to begin including support for
> : non-English and non alphabetic languages.
>
> : Has anyone thought about this yet? What does string and pattern matching
> : mean in, for example, Japanese?
>
> 1. Other folks have been thinking about it, especially Icon users in Asia.
> For example, a Chinese version of Icon has been done by researchers in China.
Glad to hear it.
>
>
> 2. Going to Unicode might not be *that* difficult, but I think Unicode isn't
> really as widely adopted as you might suggest. Many people seem to be using
> mixed 8/16-bit strings.
Windows NT and Macintosh use Unicode. Unix is still EUC.
>
>
> 3. The semantics of string and pattern matching are no different in Japanese
> than in English. There is nothing specific to language or grammar in the Icon
> string and pattern matching repertoire. Of course, when the character set
> changes the actual code needs to change...
That surprises me. Given the above comment about mixed 8/16-bit strings, I would
expect you to have already run into the half-width/full-width character issue. How
did you handle it?
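
To make the half-width/full-width issue concrete, here is a minimal sketch in Python rather than Icon. It assumes Unicode NFKC normalization is an acceptable way to fold width variants, which is only one possible policy and not something Icon provides; the helper name fold_width is hypothetical.

```python
import unicodedata

def fold_width(text):
    """Fold full-width and half-width compatibility forms via NFKC.

    Assumption: NFKC folding is acceptable for matching purposes. It also
    folds other compatibility characters, so it is not width-only.
    """
    return unicodedata.normalize("NFKC", text)

fullwidth_a = "\uff21"   # FULLWIDTH LATIN CAPITAL LETTER A
halfwidth_ka = "\uff76"  # HALFWIDTH KATAKANA LETTER KA

# A naive comparison treats the width variants as different characters.
naive_match = fullwidth_a == "A"

# After folding, the full-width letter matches its ASCII counterpart.
folded_match = fold_width(fullwidth_a) == "A"

# Half-width katakana folds to the ordinary (full-width) katakana letter.
folded_ka = fold_width(halfwidth_ka)

# In EUC-JP the same letter occupies one byte or two depending on width,
# which is the mixed 8/16-bit situation described above.
ascii_len = len("A".encode("euc_jp"))
fullwidth_len = len(fullwidth_a.encode("euc_jp"))
```

A matcher that compares bytes or code points directly sees "A" and its full-width form as unrelated, so any cset or pattern written for one silently misses the other unless the text is folded first.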
>
>
> 4. Let's look at the current situation for mixed-character sets. I am not
> sure how Chinese Icon stands on these, but consider plain-old Windows Icon.
> Divide functionality as follows:
> non-alphabetic output: Windows Icon already can do this
> non-alphabetic input: we have known bugs in the input processing
> of these, either in Windows Icon or the IPL "vidgets" code.
> non-alphabetic string scanning: not supported, but could be
> implemented as Icon Program Library procedures. Even
> Unicode string semantics could be implemented as library
> procedures on top of (even length!) Icon strings.
Chinese is probably the easiest double-byte language to support. I don't think you
have really considered or solved all the problems until you can support Japanese
(for representation and manipulation) and Korean (for I/O).
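
The idea of library procedures layered on even-length byte strings can be sketched as follows. This is an illustration in Python, not Icon, and it assumes big-endian 16-bit code units with no surrogate handling; the names wide_len and wide_find are hypothetical and do not correspond to any existing Icon Program Library procedure.

```python
def _check_even(data):
    """Reject byte strings that cannot be a whole number of 16-bit units."""
    if len(data) % 2 != 0:
        raise ValueError("wide string must have an even number of bytes")

def wide_len(data):
    """Length in 16-bit units of an even-length byte string."""
    _check_even(data)
    return len(data) // 2

def wide_find(haystack, needle):
    """Index (in 16-bit units) of the first aligned match, or -1.

    A plain byte search can match across a unit boundary, so matches at
    odd byte offsets are skipped.
    """
    _check_even(haystack)
    _check_even(needle)
    pos = haystack.find(needle)
    while pos != -1:
        if pos % 2 == 0:
            return pos // 2
        pos = haystack.find(needle, pos + 1)
    return -1

# Two hiragana letters: U+3042 followed by U+3044, as big-endian units.
text = "\u3042\u3044".encode("utf-16-be")      # bytes 30 42 30 44
second = "\u3044".encode("utf-16-be")          # bytes 30 44
straddler = "\u4230".encode("utf-16-be")       # bytes 42 30

byte_hit = text.find(straddler)        # naive byte search, misaligned hit
aligned_hit = wide_find(text, straddler)
second_pos = wide_find(text, second)
```

The misaligned hit shows why ordinary byte-oriented scanning is not enough on its own: the bytes 42 30 appear in the data, but only as the tail of one character and the head of the next.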
>
>
> We don't really need much additional infrastructure. Some folks in the user
> community could coordinate the library procedures to do this as an
> interesting project. We do also need someone who can compile Icon from its
> C code and debug I/O problems on a non-alphabetic platform at this point.
"Non-alphabetic platform"? Hmmm, you haven't got any Chinese or Japanese grad
students on the Icon project, have you...
>
>
> --
> Clint Jeffery, jeffery@cs.utsa.edu
> Division of Computer Science, The University of Texas at San Antonio
> Research http://www.cs.utsa.edu/research/plss.html
--
---------------------------
Eric Hildum
Eric.Hildum@Japan.NCR.COM